Random surfing model

The random surfing model is a graph model which describes the probability of a random user visiting a web page. The model attempts to predict the chance that a random internet surfer will arrive at a page by either clicking a link or by accessing the site directly, for example by directly entering the website's URL in the address bar. For this reason, an assumption is made that all users surfing the internet will eventually stop following links in favor of switching to another site completely. The model is similar to a Markov chain, where the chain's states are web pages the user lands on and transitions are equally probable links between these pages.

Description

A user navigates the internet in two primary ways; the user may access a site directly by entering the site's URL or clicking a bookmark, or the user may use a series of hyperlinks to get to the desired page. The random surfer model assumes that the link which the user selects next is picked at random. The model also assumes that the number of successive links is not infinite – the user will at some point lose interest and leave the current site for a completely new site.^[1]

The random surfer model is presented as a series of nodes which indicate web pages that can be accessed at random by users. A new node is added to the a graph when a new website is published. The movement about the graphs nodes is modeled by choosing a start node at random, then performing a short and random traversal of the nodes, or random walk. This traversal is analogous to a user accessing a website, then following hyperlink $t$ number of times, until the user either exits the page or accesses another site completely. Connections to other nodes in this graph are formed when outbound links are placed on the page.

Graph definitions

In the random surfing model, webgraphs are presented as a sequence of directed graphs $G_{t},t=1,2,\ldots$ such that a graph $G_{t}$ has $t$ vertices and $t$ edges. The process of defining graphs is parameterized with a probability $p$ , thus we let $q=1-p$ .^[2]

Nodes of the model arrive one at time, forming $k$ connections to the existing graph $G_{t}$ . In some models, connections represent directed edges, and in others, connections represent undirected edges. Models start with a single node $v_{0}$ and have $k$ self-loops. $v_{t}$ denotes a vertex added in the $t^{th}$ step, and $n$ denotes the total number of vertices.^[1]

Model 1. (1-step walk with self-loop)

At time $t$ , vertex $v_{t}$ makes $k$ connections by $k$ iterations of the following steps:

Pick an existing node $v$ uniformly at random from $\{v_{0},v_{1},\ldots ,v_{t-1}\}$
With probability $p$ stay at $v$ ; with probability $1-p$ take a 1-step walk to a random neighbor of $v$
Add an edge from $v_{t}$ to the current node

For directed graphs, edges added are directed from $v_{t}$ into the existing graph. Edges are undirected in respective undirected graphs.

Model 2. (Random walks with coin flips)

At time $t$ , vertex $v_{t}$ makes $k$ connections by $k$ iterations of the following steps:

Pick an existing node $v$ uniformly at random from $\{v_{0},v_{1},...,v_{t-1}\}$
Flip a coin of bias $p$
If the coin comes up heads add an edge from $v_{t}$ to the current node and stop
If the coin comes up tails, move to a random neighbor of the current node and go back to step 2

For directed graphs, edges added are directed from $v_{t}$ into the existing graph. Edges are undirected in respective undirected graphs.

Limitations

There are some caveats to the standard random surfer model, one of which is that the model ignores the content of the sites which users select – since the model assumes links are selected at random. Because users tend to have a goal in mind when surfing the internet, the content of the linked sites is a determining factor of whether or not the user will click a link.^[1]^[2]

Application

The normalized eigenvector centrality combined with random surfer model's assumption of random jumps created the foundation of Google's PageRank algorithm.^[2]^[3]

References

^ ^a ^b ^c Blum, Avrim; Chan, T-H. Hubert; Rwebangira, Mugizi Robert (21 January 2006). Written at 3600 University City Science Center Philadelphia, PA, United States. "A Random-Surfer Web-Graph Model" (PDF). Computer Science Department. ANALCO '06: Proceedings of the Meeting on Analytic Algorithmics and Combinatorics. Carnegie Mellon University: Society for Industrial and Applied Mathematics: 238–246.{{cite journal}}: CS1 maint: location (link)
^ ^a ^b ^c Chebolu, Prasad; Melsted, Páll (1 January 2008). "PageRank and the random surfer model" (PDF). Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Department of Mathematical Sciences, Carnegie Mellon University: 1010–1018.
^ Zaki, Mohammed J.; Meira, Jr., Wagner (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. ISBN 9780521766333.

External links

Case study on random web surfers
Data Mining and Analysis: Fundamental Concepts and Algorithms is freely available to download for personal use here
Microsoft research on PageRank and the Random Surfer Model
Paper on how Google web search implements PageRank to find relevant search results

[:0-1] Blum, Avrim; Chan, T-H. Hubert; Rwebangira, Mugizi Robert (21 January 2006). Written at 3600 University City Science Center Philadelphia, PA, United States. "A Random-Surfer Web-Graph Model" (PDF). Computer Science Department. ANALCO '06: Proceedings of the Meeting on Analytic Algorithmics and Combinatorics. Carnegie Mellon University: Society for Industrial and Applied Mathematics: 238–246.{{cite journal}}: CS1 maint: location (link)

[:1-2] Chebolu, Prasad; Melsted, Páll (1 January 2008). "PageRank and the random surfer model" (PDF). Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms. Department of Mathematical Sciences, Carnegie Mellon University: 1010–1018.

[3] Zaki, Mohammed J.; Meira, Jr., Wagner (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. ISBN 9780521766333.

[1]

[2]

[3]